KMID : 1022420180100010033
Phonetics and Speech Sciences, 2018, Vol. 10, No. 1, pp. 33-38
Multi-resolution DenseNet based acoustic models for reverberant speech recognition
Park Sun-Chan, Jeong Yong-Won, Kim Hyung-Soon
Abstract
Although deep neural network-based acoustic models have greatly improved the performance of automatic speech recognition (ASR), reverberation still degrades the performance of distant speech recognition in indoor environments. In this paper, we adopt DenseNet, which has shown strong performance in image classification tasks, to improve reverberant speech recognition. DenseNet enables deep convolutional neural networks (CNNs) to be trained effectively by concatenating the feature maps of each convolutional layer. In addition, we extend the concept of the multi-resolution CNN to a multi-resolution DenseNet for robust speech recognition in reverberant environments. We evaluate reverberant speech recognition performance on the single-channel ASR task of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge 2014. The experimental results show that the DenseNet-based acoustic models outperform conventional CNN-based ones, and that the multi-resolution DenseNet provides a further performance improvement.
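The dense connectivity the abstract refers to can be illustrated with a minimal NumPy sketch: each layer produces a fixed number of new feature maps (the growth rate) and concatenates them onto the running stack along the channel axis, so every layer sees the outputs of all preceding layers. The 1x1 channel projection below is a hypothetical stand-in for a learned convolution; layer counts, growth rate, and input shape are illustrative assumptions, not values from the paper.

```python
import numpy as np

def conv_layer(x, out_channels, rng):
    # Stand-in for a learned convolution: a random 1x1 projection over
    # channels followed by ReLU. (Illustrative only; a real acoustic
    # model would use trained 2-D convolutions over freq x time.)
    c_in = x.shape[0]
    w = rng.standard_normal((out_channels, c_in)) * 0.1
    return np.maximum(0.0, np.einsum('oc,chw->ohw', w, x))

def dense_block(x, num_layers, growth_rate, rng):
    # DenseNet-style block: each layer's output (growth_rate channels)
    # is concatenated onto the stack of feature maps, so layer i
    # receives the feature maps of all layers 0..i-1 as input.
    for _ in range(num_layers):
        y = conv_layer(x, growth_rate, rng)
        x = np.concatenate([x, y], axis=0)  # concat along channel axis
    return x

rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 8, 8))  # (channels, freq, time)
out = dense_block(feat, num_layers=4, growth_rate=12, rng=rng)
print(out.shape)  # (51, 8, 8): 3 input + 4 layers * 12 new channels
```

A multi-resolution variant, as the abstract describes, would run such blocks on input features extracted at more than one time-frequency resolution and combine the results; this sketch shows only the single-resolution concatenation mechanism.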
Keywords
convolutional neural network, DenseNet, multi-resolution, speech recognition